
COS-781: Data Mining

An improved Apriori algorithm for efficient frequent itemset mining

This repository contains the research, code, documentation, and presentation materials for COS-781: Data Mining, focusing on improved algorithms for frequent itemset mining. The project includes complete implementations of:

  • Traditional Apriori Algorithm - Complete implementation with runtime tracking
  • FP-Growth Algorithm - Full implementation with tree-based pattern mining
  • Improved Apriori Algorithm - Weighted Apriori with intersection-based counting optimization

The implementation includes comprehensive benchmarking, visualization, and comparison capabilities to analyze algorithm performance on real-world datasets.
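The intersection-based counting named in the list above can be illustrated with a short sketch. One common way to realize it is a vertical layout: each itemset maps to the set of transaction IDs (TIDs) that contain it, so the support of a joined candidate is the size of the intersection of its parents' TID sets rather than a fresh pass over all transactions. This is a minimal sketch under assumptions: transactions are Python sets of hashable items, `apriori_tidset` is a hypothetical name, the weighting step of the project's Weighted Apriori is not modeled, and it is not the code in `improved_apriori.py`.

```python
from itertools import combinations


def apriori_tidset(transactions, min_support_count):
    """Hypothetical sketch: level-wise Apriori with TID-set intersection counting.

    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    # Vertical layout: 1-itemset -> set of transaction IDs that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(frozenset([item]), set()).add(tid)

    # Frequent 1-itemsets, kept with their TID sets for the next level.
    current = {
        itemset: tids
        for itemset, tids in tidsets.items()
        if len(tids) >= min_support_count
    }
    frequent = {itemset: len(tids) for itemset, tids in current.items()}

    k = 2
    while current:
        next_level = {}
        seen = set()
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            candidate = a | b
            if len(candidate) != k or candidate in seen:
                continue
            seen.add(candidate)
            # Prune step: every (k-1)-subset must itself be frequent.
            if any(frozenset(sub) not in current for sub in combinations(candidate, k - 1)):
                continue
            # Counting step: TIDs(a | b) == TIDs(a) & TIDs(b),
            # so no rescan of the transactions is needed.
            tids = current[a] & current[b]
            if len(tids) >= min_support_count:
                next_level[candidate] = tids
        frequent.update({itemset: len(tids) for itemset, tids in next_level.items()})
        current = next_level
        k += 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = apriori_tidset(transactions, min_support_count=3)
# {bread, milk}, {bread, diaper}, {milk, diaper} and {diaper, beer} each have
# support 3; no 3-itemset reaches the threshold.
```

Keeping the TID sets in memory trades space for speed: each level reuses the previous level's sets instead of scanning the full transaction list once per itemset size.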

Research Overview

Learn about the research topic, objectives, and motivation behind improving the Apriori algorithm.

Algorithm Documentation

Detailed documentation of the Apriori algorithm, FP-Growth algorithm, and the proposed improvements.

Exploratory Data Analysis

Comprehensive data exploration including dataset inspection, visualizations, and insights for frequent itemset mining.

Data Preprocessing

Data preprocessing pipeline covering missing data handling, feature engineering, and filtering strategies.

Implementation

Code documentation and implementation details for the improved algorithm.

Presentation

Access slides and presentation materials for the final presentation.
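The Data Preprocessing page described above covers missing-data handling and filtering. As a minimal sketch, assuming the raw data arrives as `(transaction_id, item)` rows, those two steps might look like the following. `build_transactions` and its parameters are hypothetical, not the API of `preprocessing.py`, and the feature engineering step is not shown.

```python
from collections import Counter, defaultdict


def build_transactions(rows, min_item_count=1, min_transaction_size=1):
    """Hypothetical sketch: group (transaction_id, item) rows into item sets."""
    baskets = defaultdict(set)
    for tid, item in rows:
        # Missing-data handling: skip rows without an ID or with an empty item.
        if tid is None or item is None:
            continue
        name = str(item).strip().lower()  # normalize whitespace and case
        if not name:
            continue
        baskets[tid].add(name)

    # Count in how many baskets each item occurs.
    item_counts = Counter(item for basket in baskets.values() for item in basket)

    transactions = []
    for basket in baskets.values():
        # Filtering: drop rare items, then drop baskets that became too small.
        kept = {item for item in basket if item_counts[item] >= min_item_count}
        if len(kept) >= min_transaction_size:
            transactions.append(kept)
    return transactions


rows = [
    (1, "Milk"), (1, " bread "), (2, "milk"), (2, None),
    (3, ""), (3, "eggs"), (None, "milk"), (4, "bread"), (4, "milk"),
]
print(build_transactions(rows, min_item_count=2, min_transaction_size=2))
```

Filtering rare items before mining shrinks the candidate space, at the cost of never reporting itemsets that contain those items.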

This monorepo contains:

  • docs/ - Documentation website (this site)
    • src/
      • content/
        • docs/
    • astro.config.mjs
    • package.json
  • exploration/ - Main implementation and experimentation
    • apriori.py - Traditional Apriori implementation
    • fp_growth.py - FP-Growth algorithm implementation
    • improved_apriori.py - Improved Apriori with intersection-based counting
    • dataset_loader.py - Dataset downloading and caching
    • preprocessing.py - Data preprocessing utilities
    • exploration.ipynb - Main Jupyter notebook with experiments
    • pyproject.toml
  • package.json
  • pyproject.toml
  • README.md

Key features:

  • Complete Algorithm Implementations - Apriori and FP-Growth with full functionality
  • Runtime Tracking - Comprehensive benchmarking capabilities
  • Data Preprocessing - Automated dataset loading and transaction preparation
  • Visualization - Rich visualizations for algorithm analysis
  • Comparison Framework - Side-by-side algorithm comparison
  • Programmatic Support Calculation - Automatic calculation of the minimum support threshold
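The support-calculation and runtime-tracking features can be approximated with two small helpers. This is a minimal sketch under assumptions: the absolute threshold is derived from a relative support fraction (the project's own threshold heuristic is not described here and may differ), and runtimes are measured with `time.perf_counter`. Both function names are hypothetical.

```python
import math
import time


def min_support_count(n_transactions, min_support):
    """Hypothetical helper: convert a relative support in (0, 1] to an absolute count."""
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")
    # Round before ceil so float noise (e.g. 0.07 * 100 == 7.000000000000001)
    # does not raise the threshold by one.
    return max(1, math.ceil(round(min_support * n_transactions, 9)))


def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


threshold = min_support_count(n_transactions=1000, min_support=0.02)  # 20
ordered, seconds = timed(sorted, [3, 1, 2])
```

Wrapping each mining call in the same timer keeps runtime comparisons between algorithms consistent, since every measurement uses the same clock and excludes setup work done outside the call.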