Association Rule¶
1. Introduction¶
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is commonly used in market basket analysis to identify sets of products that frequently co-occur in transactions.
2. Key Concepts¶
2.1 Itemset¶
- Itemset: A collection of one or more items.
- Frequent Itemset: An itemset that appears frequently in the dataset.
2.2 Support¶
- Definition: Support of an itemset is the proportion of transactions in the dataset in which the itemset appears.
- Formula:
[
\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}
]
2.3 Confidence¶
- Definition: Confidence of a rule is the proportion of the transactions containing X that also contain Y.
- Formula:
[
\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
]
2.4 Lift¶
- Definition: Lift is the ratio of the observed support to that expected if X and Y were independent.
- Formula: [ \text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)} ]
- Interpretation:
- Lift > 1: Positive correlation between X and Y
- Lift < 1: Negative correlation between X and Y
- Lift = 1: X and Y are independent
2.5 Consequent¶
- Definition: The item(s) found in the right-hand side of the rule.
2.6 Antecedent¶
-
Definition: The item(s) found in the left-hand side of the rule.
Antecedent -> Consequent {x} -> {Y}
3. Algorithm¶
Common algorithms used for mining association rules include:
3.1 Apriori¶
- Apriori: A classic algorithm for frequent itemset mining and association rule learning over transactional databases.
3.2 Eclat¶
- Eclat: Stands for Equivalence Class Clustering and bottom-up Lattice Traversal, used for mining frequent itemsets.
3.3 FP-Growth¶
- FP-Growth: Frequent Pattern Growth algorithm is an efficient and scalable method for mining the complete set of frequent patterns.
4. Metrics¶
- Support Threshold: The minimum support required for an itemset to be considered frequent.
- Confidence Threshold: The minimum confidence required for a rule to be considered strong.
- Lift Threshold: The threshold for determining the strength of correlation between itemsets.
5. Applications¶
- Market Basket Analysis: Identifying products that frequently co-occur in transactions.
- Recommendation Systems: Suggesting products to customers based on their purchase history.
- Fraud Detection: Identifying unusual patterns that may indicate fraudulent activity.
6. Example¶
For a dataset of transactions, an example rule might be:
- Rule: If a customer buys bread, they are likely to buy butter.
- Support: 20% of transactions include both bread and butter.
- Confidence: 70% of transactions that include bread also include butter.
- Lift: The likelihood of buying butter when bread is bought is 1.5 times higher than buying butter independently.
7. Summary¶
Association rule mining is a powerful tool for discovering interesting relationships in large datasets. By understanding and applying key concepts such as support, confidence, and lift, valuable insights can be gained in various domains, from retail to healthcare.