Implementation
Implementation Details
Technology Stack
- Language: Python 3.9+
- Package Manager: uv (unified Python package manager)
- Dependencies:
  - numpy, pandas, polars (data processing)
  - matplotlib, seaborn (visualization)
  - requests (dataset downloading)
  - ipykernel (Jupyter notebook support)
- Environment: Jupyter notebooks for exploration and analysis
Project Structure
cos-781/
- exploration/ - Main implementation and exploration
  - apriori.py - Traditional Apriori implementation
  - fp_growth.py - FP-Growth algorithm implementation
  - improved_apriori.py - Improved Apriori implementation (skeleton)
  - dataset_loader.py - Dataset downloading and caching utilities
  - preprocessing.py - Data preprocessing and transaction utilities
  - exploration.ipynb - Main Jupyter notebook with experiments
  - pyproject.toml - Python project configuration
- docs/ - Documentation website
  - …
- report/ - LaTeX report
  - …
- pyproject.toml - Root project configuration
1. Dataset Loader (dataset_loader.py)
Utilities for downloading and caching datasets:
- DatasetCache class for managing cached datasets
- Automatic caching with MD5-based keys
- Support for JSONL file downloads
- Metadata tracking for cached files
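The caching idea can be sketched as follows. This is a minimal illustration, not the project's actual DatasetCache: the class name SimpleDatasetCache, its methods, the metadata.json file name, and the choice to hash the URL are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


class SimpleDatasetCache:
    """Hypothetical stand-in for DatasetCache (names and layout are assumed)."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Assumed location for the metadata that tracks cached files.
        self.metadata_path = self.cache_dir / "metadata.json"

    @staticmethod
    def cache_key(url):
        # The MD5 hex digest of the URL gives a stable, filesystem-safe key.
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def path_for(self, url):
        return self.cache_dir / f"{self.cache_key(url)}.jsonl"

    def is_cached(self, url):
        return self.path_for(url).exists()

    def store(self, url, content):
        # Write the downloaded bytes, then record where they came from.
        path = self.path_for(url)
        path.write_bytes(content)
        metadata = self._load_metadata()
        metadata[self.cache_key(url)] = {"url": url, "size_bytes": len(content)}
        self.metadata_path.write_text(json.dumps(metadata, indent=2))
        return path

    def _load_metadata(self):
        if self.metadata_path.exists():
            return json.loads(self.metadata_path.read_text())
        return {}
```

In a real loader the bytes passed to store() would typically come from something like requests.get(url).content, and is_cached() would be checked first to skip the download.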
2. Preprocessing (preprocessing.py)
Data preprocessing utilities:
- filter_verified_purchases() - Filter for verified purchases only
- create_user_carts() - Group products by user to create transactions
- prepare_transactions() - Complete preprocessing pipeline
- get_transaction_stats() - Dataset statistics
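The pipeline above could look roughly like the sketch below. The function names come from the list, but the signatures, the record fields (user_id, product_id, verified_purchase), the min_items parameter, and the returned statistics are assumptions. The project lists pandas and polars for data processing; plain Python is used here only to keep the example self-contained.

```python
from collections import defaultdict


def filter_verified_purchases(reviews):
    # Keep only records flagged as verified (field name is assumed).
    return [r for r in reviews if r.get("verified_purchase")]


def create_user_carts(reviews):
    # Group product IDs by user; a set removes repeat purchases.
    carts = defaultdict(set)
    for r in reviews:
        carts[r["user_id"]].add(r["product_id"])
    return dict(carts)


def prepare_transactions(reviews, min_items=2):
    # Full pipeline: filter, group, then drop carts too small to
    # contribute to multi-item itemsets (min_items is an assumption).
    carts = create_user_carts(filter_verified_purchases(reviews))
    return [sorted(items) for items in carts.values() if len(items) >= min_items]


def get_transaction_stats(transactions):
    sizes = [len(t) for t in transactions]
    unique_items = {item for t in transactions for item in t}
    return {
        "num_transactions": len(transactions),
        "num_unique_items": len(unique_items),
        "avg_transaction_size": sum(sizes) / len(sizes) if sizes else 0.0,
    }
```

Each resulting transaction is a sorted list of product IDs for one user, which is the shape a frequent-itemset miner generally expects as input.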
Runtime Tracking
All algorithms include runtime tracking:
- fit_time - Total time for fitting
- frequent_itemsets_time - Time to find frequent itemsets
- association_rules_time - Time to generate association rules
- get_runtime_stats() method to retrieve statistics
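One way such tracking might be wired in is shown below. The attribute names and get_runtime_stats() follow the list above; the RuntimeTrackingMixin class, the two hook methods, and the toy CountingMiner are hypothetical and exist only to make the sketch runnable.

```python
import time
from collections import Counter


class RuntimeTrackingMixin:
    """Hypothetical mixin; the project's algorithms may be structured differently."""

    fit_time = None
    frequent_itemsets_time = None
    association_rules_time = None

    def fit(self, transactions):
        # perf_counter is a monotonic clock suited to measuring intervals.
        start = time.perf_counter()
        itemsets = self._find_frequent_itemsets(transactions)
        after_itemsets = time.perf_counter()
        self._generate_rules(itemsets)
        end = time.perf_counter()

        self.frequent_itemsets_time = after_itemsets - start
        self.association_rules_time = end - after_itemsets
        self.fit_time = end - start
        return self

    def get_runtime_stats(self):
        return {
            "fit_time": self.fit_time,
            "frequent_itemsets_time": self.frequent_itemsets_time,
            "association_rules_time": self.association_rules_time,
        }


class CountingMiner(RuntimeTrackingMixin):
    """Toy miner that only counts single items, used to exercise the mixin."""

    def __init__(self, min_support_count=2):
        self.min_support_count = min_support_count

    def _find_frequent_itemsets(self, transactions):
        counts = Counter(item for t in transactions for item in set(t))
        self.frequent_itemsets_ = {
            item: c for item, c in counts.items() if c >= self.min_support_count
        }
        return self.frequent_itemsets_

    def _generate_rules(self, itemsets):
        # Rule generation is omitted in this toy example.
        self.rules_ = []
```

After calling CountingMiner().fit(transactions), get_runtime_stats() returns the three timings in seconds, with fit_time covering both phases.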