Improved Apriori Algorithm

Overview

The Improved Apriori algorithm (also known as the Weighted Apriori Algorithm) is an optimized version of the traditional Apriori algorithm that eliminates repeated database scans by using intersection-based counting. The key innovation is calculating support for candidate itemsets by intersecting the support transactions of their K-order subsets, ensuring the database is scanned only once.

Key Innovation: Single Database Scan

The primary enhancement involves calculating the support of candidate itemsets $C_k$ based on the support transactions of their K-order subsets. This ensures the database is scanned only once, dramatically improving efficiency compared to the traditional Apriori algorithm.

Method

Initial Scan and $L_1$ Generation (Single Database Scan Phase)
- Scan DB once to find initial support transactions for all items
- Generate candidate 1-itemsets ($C_1$)
- Filter $C_1$ to find $L_1$ based on minimum weighted support
Iterative Generation and Pruning
- While $L_{k-1}$ is not empty:
  - Generate candidate k-itemsets $C_k$ from $L_{k-1}$ using apriori_gen
  - Calculate support for candidates by intersecting K-order subset support transactions
  - Filter $C_k$ to find frequent k-itemsets $L_k$
  - Update the total frequent set $F$ and increment $k$
Return $F$

Optimization: Intersection-Based Counting

The algorithm’s key optimization is in Step 4, where the count of a candidate itemset $C$ is found by determining the intersection of support transactions in all K-order subset support affairs. This process makes it equivalent to finding the support $K+1$ without needing to re-scan the source database multiple times.

How It Works

For a candidate k-itemset $C$:

Generate all (k-1)-subsets of $C$
Retrieve the support transactions (transaction IDs) for each (k-1)-subset
Compute the intersection of all these support transaction sets
The size of the intersection gives the support count for $C$

Traditional Apriori Algorithm - Standard implementation with multiple database scans
FP-Growth Algorithm - Tree-based approach avoiding candidate generation

Improved Apriori Algorithm