Improved Apriori Algorithm
Improved Apriori Algorithm
Section titled “Improved Apriori Algorithm”Overview
Section titled “Overview”The Improved Apriori algorithm (also known as the Weighted Apriori Algorithm) is an optimized version of the traditional Apriori algorithm that eliminates repeated database scans by using intersection-based counting. The key innovation is calculating support for candidate itemsets by intersecting the support transactions of their K-order subsets, ensuring the database is scanned only once.
Key Innovation: Single Database Scan
Section titled “Key Innovation: Single Database Scan”The primary enhancement involves calculating the support of candidate itemsets $C_k$ based on the support transactions of their K-order subsets. This ensures the database is scanned only once, dramatically improving efficiency compared to the traditional Apriori algorithm.
Method
Section titled “Method”-
Initial Scan and $L_1$ Generation (Single Database Scan Phase)
- Scan DB once to find initial support transactions for all items
- Generate candidate 1-itemsets ($C_1$)
- Filter $C_1$ to find $L_1$ based on minimum weighted support
-
Iterative Generation and Pruning
- While $L_{k-1}$ is not empty:
- Generate candidate k-itemsets $C_k$ from $L_{k-1}$ using
apriori_gen - Calculate support for candidates by intersecting K-order subset support transactions
- Filter $C_k$ to find frequent k-itemsets $L_k$
- Update the total frequent set $F$ and increment $k$
- Generate candidate k-itemsets $C_k$ from $L_{k-1}$ using
- While $L_{k-1}$ is not empty:
-
Return $F$
Optimization: Intersection-Based Counting
Section titled “Optimization: Intersection-Based Counting”The algorithm’s key optimization is in Step 4, where the count of a candidate itemset $C$ is found by determining the intersection of support transactions in all K-order subset support affairs. This process makes it equivalent to finding the support $K+1$ without needing to re-scan the source database multiple times.
How It Works
Section titled “How It Works”For a candidate k-itemset $C$:
- Generate all (k-1)-subsets of $C$
- Retrieve the support transactions (transaction IDs) for each (k-1)-subset
- Compute the intersection of all these support transaction sets
- The size of the intersection gives the support count for $C$
Related Algorithms
Section titled “Related Algorithms”- Traditional Apriori Algorithm - Standard implementation with multiple database scans
- FP-Growth Algorithm - Tree-based approach avoiding candidate generation