Skip to content

Improved Apriori Algorithm

The Improved Apriori algorithm (also known as the Weighted Apriori Algorithm) is an optimized version of the traditional Apriori algorithm that eliminates repeated database scans by using intersection-based counting. The key innovation is calculating support for candidate itemsets by intersecting the support transactions of their K-order subsets, ensuring the database is scanned only once.

The primary enhancement involves calculating the support of candidate itemsets $C_k$ based on the support transactions of their K-order subsets. This ensures the database is scanned only once, dramatically improving efficiency compared to the traditional Apriori algorithm.

  1. Initial Scan and $L_1$ Generation (Single Database Scan Phase)

    • Scan DB once to find initial support transactions for all items
    • Generate candidate 1-itemsets ($C_1$)
    • Filter $C_1$ to find $L_1$ based on minimum weighted support
  2. Iterative Generation and Pruning

    • While $L_{k-1}$ is not empty:
      • Generate candidate k-itemsets $C_k$ from $L_{k-1}$ using apriori_gen
      • Calculate support for candidates by intersecting K-order subset support transactions
      • Filter $C_k$ to find frequent k-itemsets $L_k$
      • Update the total frequent set $F$ and increment $k$
  3. Return $F$

The algorithm’s key optimization is in Step 4, where the count of a candidate itemset $C$ is found by determining the intersection of support transactions in all K-order subset support affairs. This process makes it equivalent to finding the support $K+1$ without needing to re-scan the source database multiple times.

For a candidate k-itemset $C$:

  1. Generate all (k-1)-subsets of $C$
  2. Retrieve the support transactions (transaction IDs) for each (k-1)-subset
  3. Compute the intersection of all these support transaction sets
  4. The size of the intersection gives the support count for $C$