
Implementation

  • Language: Python 3.9+
  • Package Manager: uv (unified Python package manager)
  • Dependencies:
    • numpy, pandas, polars (data processing)
    • matplotlib, seaborn (visualization)
    • requests (dataset downloading)
    • ipykernel (Jupyter notebook support)
  • Environment: Jupyter notebooks for exploration and analysis
  • cos-781/
    • exploration/ - Main implementation and exploration
      • apriori.py - Traditional Apriori implementation
      • fp_growth.py - FP-Growth algorithm implementation
      • improved_apriori.py - Improved Apriori implementation (skeleton)
      • dataset_loader.py - Dataset downloading and caching utilities
      • preprocessing.py - Data preprocessing and transaction utilities
      • exploration.ipynb - Main Jupyter notebook with experiments
      • pyproject.toml - Python project configuration
    • docs/ - Documentation website
    • report/ - LaTeX report
    • pyproject.toml - Root project configuration

dataset_loader.py provides utilities for downloading and caching datasets:

  • DatasetCache class for managing cached datasets
  • Automatic caching with MD5-based keys
  • Support for JSONL file downloads
  • Metadata tracking for cached files
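
The caching behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the DatasetCache name comes from the list above, while the methods (key_for, get, metadata), the fetch parameter, the on-disk layout, and the read_jsonl helper are assumptions. The default fetcher uses the standard library so the sketch runs without extra dependencies; the project itself lists requests for downloading.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Default fetcher (stdlib only); assumed stand-in for a requests call."""
    with urlopen(url) as response:
        return response.read()


class DatasetCache:
    """Hypothetical sketch: store downloads under an MD5-derived file name."""

    def __init__(self, cache_dir: str = ".cache",
                 fetch: Callable[[str], bytes] = _http_get):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch

    def key_for(self, url: str) -> str:
        # MD5 only derives a stable file name here; it is not a security measure.
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> Path:
        """Return the local path for `url`, downloading only on a cache miss."""
        key = self.key_for(url)
        data_path = self.cache_dir / f"{key}.jsonl"
        meta_path = self.cache_dir / f"{key}.meta.json"
        if not data_path.exists():
            payload = self.fetch(url)
            data_path.write_bytes(payload)
            # Metadata lives next to the data file (assumed layout).
            meta = {"url": url, "size_bytes": len(payload),
                    "downloaded_at": time.time()}
            meta_path.write_text(json.dumps(meta))
        return data_path

    def metadata(self, url: str) -> dict:
        """Load the metadata recorded when `url` was first downloaded."""
        meta_path = self.cache_dir / f"{self.key_for(url)}.meta.json"
        return json.loads(meta_path.read_text())


def read_jsonl(path: Path) -> list:
    """Parse one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Passing the fetcher in as a parameter keeps the cache logic testable offline: a second call to get() with the same URL returns the cached path without invoking the fetcher again.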

preprocessing.py provides data preprocessing utilities:

  • filter_verified_purchases() - Filter for verified purchases only
  • create_user_carts() - Group products by user to create transactions
  • prepare_transactions() - Complete preprocessing pipeline
  • get_transaction_stats() - Dataset statistics
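
To make the pipeline above concrete, here is a dependency-free sketch of the four steps. The function names match the list, but the signatures, the record fields (user_id, product_id, verified_purchase), the min_items parameter, and the returned statistics are assumptions for illustration; the project's own versions presumably operate on pandas or polars data frames.

```python
from collections import defaultdict
from statistics import mean


def filter_verified_purchases(reviews):
    """Keep only reviews flagged as verified purchases (assumed field name)."""
    return [r for r in reviews if r.get("verified_purchase")]


def create_user_carts(reviews):
    """Group product ids by user; each user's product set is one transaction."""
    carts = defaultdict(set)
    for r in reviews:
        carts[r["user_id"]].add(r["product_id"])
    return dict(carts)


def prepare_transactions(reviews, min_items=2):
    """Filter, group, then drop carts too small to yield multi-item itemsets.

    `min_items` is a hypothetical parameter, not taken from the project.
    """
    carts = create_user_carts(filter_verified_purchases(reviews))
    return [sorted(items) for items in carts.values() if len(items) >= min_items]


def get_transaction_stats(transactions):
    """Summarise a list of transactions (assumed set of statistics)."""
    sizes = [len(t) for t in transactions]
    return {
        "num_transactions": len(transactions),
        "num_unique_items": len({item for t in transactions for item in t}),
        "avg_transaction_size": mean(sizes) if sizes else 0.0,
    }


# Made-up sample records, purely illustrative.
reviews = [
    {"user_id": "u1", "product_id": "A", "verified_purchase": True},
    {"user_id": "u1", "product_id": "B", "verified_purchase": True},
    {"user_id": "u2", "product_id": "A", "verified_purchase": False},
    {"user_id": "u2", "product_id": "C", "verified_purchase": True},
    {"user_id": "u3", "product_id": "B", "verified_purchase": True},
    {"user_id": "u3", "product_id": "C", "verified_purchase": True},
]
transactions = prepare_transactions(reviews)
stats = get_transaction_stats(transactions)
```

In this sample, user u2 ends up with a single verified product and is dropped, leaving two transactions for the mining step.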

All algorithm implementations track their runtime:

  • fit_time - Total time for fitting
  • frequent_itemsets_time - Time to find frequent itemsets
  • association_rules_time - Time to generate association rules
  • get_runtime_stats() method to retrieve statistics
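
One way such tracking could be wired into a miner is sketched below. The three attribute names and get_runtime_stats() come from the list above; the TimedPairMiner class, its parameters, and its deliberately simple item-and-pair counting are hypothetical and stand in for the real Apriori and FP-Growth implementations, which are not shown here.

```python
import time
from collections import Counter
from itertools import combinations


class TimedPairMiner:
    """Hypothetical miner: finds frequent items and pairs, timing each phase."""

    def __init__(self, min_support=0.5, min_confidence=0.6):
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.fit_time = 0.0
        self.frequent_itemsets_time = 0.0
        self.association_rules_time = 0.0
        self.frequent_itemsets_ = {}
        self.rules_ = []

    def _find_frequent_itemsets(self, transactions):
        # Count every single item and every pair, then keep the frequent ones.
        counts = Counter()
        for t in transactions:
            items = sorted(set(t))
            counts.update((i,) for i in items)
            counts.update(combinations(items, 2))
        n = len(transactions)
        return {k: c / n for k, c in counts.items() if c / n >= self.min_support}

    def _generate_rules(self, itemsets):
        # For each frequent pair, emit lhs -> rhs when confidence is high enough.
        rules = []
        for itemset, support in itemsets.items():
            if len(itemset) != 2:
                continue
            a, b = itemset
            for lhs, rhs in ((a, b), (b, a)):
                confidence = support / itemsets[(lhs,)]
                if confidence >= self.min_confidence:
                    rules.append((lhs, rhs, confidence))
        return rules

    def fit(self, transactions):
        # perf_counter is a monotonic clock intended for measuring intervals.
        start = time.perf_counter()
        self.frequent_itemsets_ = self._find_frequent_itemsets(transactions)
        mid = time.perf_counter()
        self.rules_ = self._generate_rules(self.frequent_itemsets_)
        end = time.perf_counter()
        self.frequent_itemsets_time = mid - start
        self.association_rules_time = end - mid
        self.fit_time = end - start
        return self

    def get_runtime_stats(self):
        return {
            "fit_time": self.fit_time,
            "frequent_itemsets_time": self.frequent_itemsets_time,
            "association_rules_time": self.association_rules_time,
        }


miner = TimedPairMiner().fit([["A", "B"], ["B", "C"], ["A", "B", "C"]])
runtime = miner.get_runtime_stats()
```

Timing the two phases separately, in addition to the total, makes it possible to compare algorithms on the itemset-mining step alone, which is where approaches such as Apriori and FP-Growth differ.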